Abstract


Recommender systems, tools for predicting users' potential preferences
by computing history data and users' interests, show an increasing
importance in various Internet applications such as online shopping.
As a well-known recommendation method, neighbourhood-based
collaborative filtering has attracted considerable attention recently. The
risk of revealing users' private information during the process of
filtering has attracted noticeable research interest. Among the current
solutions, probabilistic techniques have shown a powerful privacy
preserving effect. The existing methods deploying probabilistic
techniques fall into three categories: one adds differential privacy noise
to the covariance matrix; one introduces randomisation in the
neighbour selection process; the other [29] applies differential privacy
in both the neighbour selection process and the covariance matrix. When
facing the k Nearest Neighbour (kNN) attack, none of the existing methods
provides a data utility guarantee, because of the introduction of global
randomness. In this paper, to overcome the problem of recommendation
accuracy loss, we propose a novel approach, Partitioned Probabilistic
Neighbour Selection, to ensure a required prediction accuracy while
maintaining high security against the kNN attack. We define the sum of
the k neighbours' similarity as the accuracy metric α, and the number of
user partitions, across which we select the k neighbours, as the security
metric β. We generalise the k Nearest Neighbour attack to the βk Nearest
Neighbours attack. Differing from the existing approach that selects
neighbours across the entire candidate list randomly, our method
selects neighbours from each exclusive partition of size k with a
decreasing probability. Theoretical and experimental analyses show that, to
provide an accuracy-assured recommendation, our Partitioned
Probabilistic Neighbour Selection method yields a better trade-off between
recommendation accuracy and system security.

Keywords: Privacy preserving, Differential privacy, Neighbourhood-based
collaborative filtering recommender systems, Internet Commerce


1 Introduction


Recommender systems predict users' potential preferences by aggregating
history data and users' interests. Recently, recommender systems have
shown an increasing importance in various Internet applications. For
example, Amazon has benefited for a decade from recommender systems by
providing personal recommendations to its customers, and Netflix offered
a one million U.S. dollar award for improving its recommender system to
make its business more profitable [10, 15, 25]. Currently, in recommender
systems, Collaborative Filtering (CF) is a well-known technology with
three main popular techniques [17], i.e., neighbourhood-based methods,
association rules based prediction, and matrix factorisation. Among these
techniques, neighbourhood-based methods are the most widely used in
industry because of their easy implementation and high prediction
accuracy.

One of the most popular neighbourhood-based methods is k Nearest
Neighbour (kNN), which provides recommendations by aggregating the
opinions of a user's k nearest neighbours. Although kNN recommender
systems achieve very good recommendation accuracy efficiently, the risk
of revealing users' private information during the process of filtering
is still a growing concern. For example, the kNN attack presented by
Calandrino et al. exploits the property that users are more similar when
they share the same ratings on corresponding items to reveal a user's
private data. Thus, presenting an efficient privacy preserving
neighbourhood-based CF algorithm against the kNN attack, which achieves a
trade-off between system security and recommendation accuracy, has been
a natural research interest.

The literature on CF recommender systems has developed several
approaches to preserve users' privacy. Generally, cryptographic,
obfuscation, perturbation, probabilistic methods and differential privacy
are applied [29]. Among them, cryptographic methods [11, 21] provide the
most reliable security, but their additional computational cost cannot be
ignored. Obfuscation methods [27] and perturbation methods introduce
designed random noise into the original matrix to preserve customers'
sensitive information; however, the magnitude of noise is hard to
calibrate in these two types of methods [9, 29]. The probabilistic
methods provide a similarity-based weighted neighbour selection of the k
nearest neighbours. Similar to perturbation, McSherry et al. [19]
presented a naive differential privacy method which adds calibrated noise
into the covariance (similarity between users/items) matrix. Similar to
the probabilistic neighbour selection [1], Zhu et al. [29] proposed a
Private Neighbour Selection to preserve privacy against the kNN attack by
introducing differential privacy in selecting the k nearest neighbours
randomly (also adding noise into the covariance matrix with differential
privacy). Although the methods in [1, 19, 29] successfully preserve
users' privacy against the kNN attack, the low prediction accuracy due to
the global randomness should be noted. Even worse, [29] failed to
maintain differential privacy in the process of neighbour selection.
Therefore, none of the existing privacy preserving CF recommender systems
can provide enough utility while preserving users' private information.

Motivation. The current privacy preserving neighbourhood-based CF
methods do not guarantee data utility against the kNN attack. Therefore,
in this paper, we aim to present a privacy preserving neighbourhood-based 
CF recommendation scheme which satisfies the following properties: 

(1) Easy implementation. 

(2) Strictly maintain differential privacy.

(3) Significantly decrease the magnitude of noise in differential privacy. 

(4) Quantify the level of recommendation accuracy and system security. 

In fact, the probabilistic methods (including naive probabilistic
methods and differential privacy methods) are efficient against the kNN
attack; however, because of the global noise, the neighbour quality, and
hence the prediction accuracy, is impacted significantly. To decrease the
magnitude of differential privacy noise, one might consider the following
approach: simply add Laplace noise to the final rating prediction after
the normal kNN CF recommendation. But Sarathy et al. have shown in [23]
that this method will leak users' privacy because the Laplace mechanism
does not work well on numeric data. So, to control the neighbour quality
and to decrease the magnitude of noise, it is natural to avoid global
randomness and the repeated addition of noise. Therefore, we present a
partitioned probabilistic neighbour selection method without any
perturbation in the process of rating prediction.

Contributions. In this paper, to overcome the problem of low
recommendation accuracy, we propose a novel method, Partitioned
Probabilistic Neighbour Selection. The main contributions of this paper
are:

(1) We expand the classic kNN attack to a more general case, the β-kNN
attack, which flexibly adjusts the size of the fake user set to improve
the attack effectiveness. β is essentially regarded as a security measure
denoting the degree of difficulty for an attacker to break a
neighbourhood-based CF recommender system. We are the first to consider
the case when β > 1.

(2) To protect users' data privacy against the β-kNN attack, we propose
a novel differential privacy preserving neighbourhood-based CF method,
which ensures a required prediction accuracy while achieving a better
trade-off between system security and recommendation accuracy against
the kNN attack.

(3) To the best of our knowledge, we are the first to propose a
theoretical analysis of the recommendation accuracy and system security
of the recommendation results from any randomised neighbour selection
method in neighbourhood-based CF recommender systems. Previous related
work only gave experimental analysis of the same issues.

Organisation. The rest of this paper is organised as follows. In
Section 2, we summarise both the advantages and disadvantages of the
existing privacy preserving methods for CF recommender systems. In
Section 3, we introduce the relevant background knowledge. In Section 4,
we introduce an existing attack on neighbourhood-based CF recommender
systems, then expand it to a general case, the β-kNN attack. Next, we
propose a novel differential privacy recommendation approach, Partitioned
Probabilistic Neighbour Selection, in Section 5. Afterwards, the
theoretical analysis of our approach in terms of both recommendation
accuracy and system security is provided in Section 6. Then, in
Section 7, we show the experimental evaluation results. Finally, in
Section 8, we conclude this paper.


2 Related Work 

A considerable body of literature has been published on preserving
customers' private data in recommender systems. However, Calandrino et
al. proposed a neighbourhood-based CF attack, the kNN attack, which is a
serious privacy threat to neighbourhood-based CF recommender systems in
e-commerce, e.g., Amazon. In this section, we briefly discuss some of the
research literature on privacy preserving neighbourhood-based CF
recommender systems.


2.1 Traditional Privacy Preserving Recommender Systems 


A number of traditional privacy preserving methods have been developed
for CF recommender systems [29], including cryptographic [11, 21],
obfuscation [22, 27], perturbation [3, 4] and probabilistic methods.
Erkin et al. [11] applied homomorphic encryption and secure multi-party
computation in privacy preserving recommender systems, which allows users
to jointly compute on their data to receive recommendations without
sharing the true data with other parties. Nikolaenko et al. [21] combined
a well-known recommendation technique, matrix factorization, with a
cryptographic method, garbled circuits, to provide recommendations
without learning the real user ratings in the database. The cryptographic
methods provide the highest guarantee for both prediction security and
system security by introducing encryption rather than adding noise to the
original record. Unfortunately, the additional computational cost limits
their application in industry [29].

Obfuscation and perturbation are two similar data processing methods. In
particular, obfuscation methods aggregate a number of random noises with
real user ratings to preserve users' sensitive information. Parameswaran
et al. [22] proposed an obfuscation framework which exchanges the sets of
similar items before submitting the user data to the CF server. Weinsberg
et al. [27] introduced extra reasonable ratings into a user's profile to
prevent inference of the user's sensitive information. Perturbation
methods modify the user's original ratings by a selected probability
distribution before using these ratings. In particular, Bilge et al.
added uniformly distributed noise to the real ratings before using them
in the prediction process, while Basu et al. regarded the deviation
between two items as the added noise. Both perturbation and obfuscation
obtain a good trade-off between prediction accuracy and system security
due to the tiny data perturbation, but the magnitude of noise or the
percentage of replaced ratings is not easy to calibrate [9, 29].

The probabilistic method [1] applied weighted sampling in neighbour
selection, which preserves users' privacy against the kNN attack
successfully; however, it cannot provide enough accuracy due to its
global randomness. This is because the performance of neighbourhood-based
CF methods largely depends on the quality of neighbours. If we regard the
top k neighbours as the highest quality neighbour set, the randomised
weighted selection process will return neighbours with lower similarity
with a high probability, so the prediction accuracy will be impacted
significantly. Therefore, achieving a trade-off between privacy and
utility while calibrating the added noise is a difficult task for these
techniques.


2.2 Differential Privacy Recommender Systems 

As a well-known privacy definition, differential privacy has been
applied in research on privacy preserving recommender systems. For
example, McSherry et al. [19] provided the first differentially private
neighbourhood-based CF recommendation algorithm. Their naive differential
privacy approach protects neighbourhood-based CF recommender systems
against the kNN attack successfully, as they added Laplace noise into the
covariance (similarity between users/items) matrix globally, so that the
output k nearest neighbours set is no longer the original top k
neighbours. However, the global noise decreases the accuracy of their
recommendation algorithm significantly.

Another differentially private neighbourhood-based CF recommendation
algorithm was proposed by Zhu et al. [29], which inspired this study. It
aims to provide better prediction accuracy than McSherry et al. [19]
while keeping differential privacy at both the neighbour selection stage
and the rating prediction stage. They proposed Private Neighbour
Collaborative Filtering (PNCF), introducing exponential differential
privacy [20] into the process of neighbour selection to guarantee system
security against the kNN attack. After selecting the k neighbours, in the
same way as McSherry et al. [19], they also added Laplace noise into the
similarity matrix to make the final prediction.

Unlike the k nearest neighbour method, which selects the k most similar
candidates, the PNCF method [29] randomly selects the k neighbours
according to each candidate u_j's weight w_j. According to the
exponential mechanism of differential privacy, the selection weight is
measured by a score function and its corresponding sensitivity as
follows:

[Equation (1), which defines the selection weight w_j, is illegible in
the source.]

where q is the score function, RS is the Recommendation-Aware Sensitivity
of the score function q for any user pair u_i and u_j, ε is the
differential privacy parameter, and U(u_a) is user u_a's candidate list.
For a user u_a, the score function q and its Recommendation-Aware
Sensitivity are defined as follows:

$$q_a(U(u_a), u_i) = sim_{ai}, \qquad (2)$$

$$RS = \max\Big\{ \max_{s \in S_{ij}} (\,\cdot\,),\ \max_{s \in S_{ij}} (\,\cdot\,) \Big\}, \qquad (3)$$

[The two inner expressions of Equation (3) are illegible in the source.]

where r_is is user u_i's rating on item t_s, sim_ai is the similarity
between users u_a and u_i, r̄_i is user u_i's average rating over all
items, and S_ij is the set of all items co-rated by both users i and j,
i.e., S_ij = {s ∈ S | r_is ≠ 0 and r_js ≠ 0}.

However, the above naive differential privacy neighbour selection is
nearly the same as the probabilistic neighbour selection [1]. To address
the problem of low prediction accuracy in [1], a truncation parameter λ
was introduced in [29]. Simply speaking, candidates whose similarity is
greater than (sim(a, k) + λ) are selected into the neighbour set, while
those whose similarity is less than (sim(a, k) − λ) will not be selected,
where sim(a, k) denotes the similarity of user u_a's kth neighbour.
Theorem 3.1 in [29] provides an equation to calculate the value of λ
[expression illegible in the source], where p is a constant, 0 < p < 1.


However, we observe that the above idea in [29] has three weaknesses.
Firstly, it adds random noise in the process of neighbour selection
twice, which is not necessary: because privacy against the kNN attack can
be preserved by introducing randomness only once, the extra randomness
decreases the prediction accuracy significantly. Secondly, the value of λ
may not be achievable. This is because computing the value of λ from p
results in a good theoretical recommendation accuracy, but does not yield
a good experimental recommendation accuracy on the given test datasets
[29]. So the PNCF method [29] will actually be a method of Global
Probabilistic Neighbour Selection [1] and cannot guarantee any
recommendation accuracy. Thirdly, the PNCF scheme breaks differential
privacy in the process of neighbour selection. Suppose there is a tiny
change in the dataset; then the similarity between the target user u_a
and the other users u_i in the candidate list will change. There may
exist a user u_c whose probability of being selected changes from 0 to
x > 0, so the ratio between the two probabilities will be 0 or infinite,
neither of which satisfies Definition 1 in Section 3.2.


3 Preliminaries


In this section, we introduce the foundational concepts and mathematical
models related to this paper in collaborative filtering, differential
privacy, and Wallenius' non-central hyper-geometric distribution.


3.1 k Nearest Neighbour Collaborative Filtering 


A collaborative filtering based recommender system predicts users'
potential preferences by aggregating the relevant historical data.
Collaborative filtering, a popular technique in recommender systems,
falls into three categories: neighbourhood-based methods, association
rules based methods, and matrix factorisation methods [17]. The
neighbourhood-based methods generally provide recommendations by
combining the opinions of a user's k nearest neighbours.

Neighbour Selection and Rating Prediction are the two main stages in
neighbourhood-based CF [29]. At the Neighbour Selection stage, a target
user u_a's neighbours are selected according to their similarity values
in the target user u_a's similarity array S_a, where similarities between
any two users/items are calculated by a similarity metric. Two of the
most popular similarity metrics are the Pearson correlation coefficient
and Cosine-based Similarity. In the kNN method, we select the k most
similar neighbours of a target user/item.


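(1) Pearson Correlation Coefficient (user-based):

$$sim_{ij} = \frac{\sum_{s \in S_{ij}} (r_{is} - \bar{r}_i)(r_{js} - \bar{r}_j)}{\sqrt{\sum_{s \in S_{ij}} (r_{is} - \bar{r}_i)^2 \, \sum_{s \in S_{ij}} (r_{js} - \bar{r}_j)^2}} \qquad (4)$$

(2) Cosine-based Similarity (user-based):

$$sim_{ij} = \cos(\mathbf{r}_i, \mathbf{r}_j) = \frac{\sum_{s \in S_{ij}} r_{is}\, r_{js}}{\sqrt{\sum_{s \in S_{ij}} r_{is}^2}\ \sqrt{\sum_{s \in S_{ij}} r_{js}^2}} \qquad (5)$$

where r_is is user u_i's rating on item t_s, r_is ∈ R, R is the user-item
rating dataset, sim_ij is the similarity between user u_i and user u_j,
r̄_i is user u_i's average rating over all items, and S_ij is the set of
all items co-rated by both users i and j, i.e.,
S_ij = {s ∈ S | r_is ≠ 0 and r_js ≠ 0}.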
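
As an illustration, the two similarity measures can be sketched in a few lines of Python. This is a minimal sketch rather than a reference implementation: the function names are hypothetical, each user's ratings are assumed to be a dict from item id to rating, a rating of 0 or a missing key is taken to mean "unrated", and the user mean is assumed to be taken over that user's rated items.

```python
import math

def co_rated(ri, rj):
    """Items rated by both users (0 or a missing key means unrated)."""
    return [s for s in ri if ri[s] != 0 and rj.get(s, 0) != 0]

def mean_rating(r):
    """Average of a user's non-zero ratings (assumed meaning of the user mean)."""
    rated = [v for v in r.values() if v != 0]
    return sum(rated) / len(rated) if rated else 0.0

def pearson(ri, rj):
    """Pearson correlation over co-rated items, centred on each user's mean."""
    items = co_rated(ri, rj)
    mi, mj = mean_rating(ri), mean_rating(rj)
    num = sum((ri[s] - mi) * (rj[s] - mj) for s in items)
    den = math.sqrt(sum((ri[s] - mi) ** 2 for s in items)
                    * sum((rj[s] - mj) ** 2 for s in items))
    return num / den if den else 0.0   # undefined cases are mapped to 0

def cosine(ri, rj):
    """Cosine similarity over co-rated items."""
    items = co_rated(ri, rj)
    num = sum(ri[s] * rj[s] for s in items)
    den = (math.sqrt(sum(ri[s] ** 2 for s in items))
           * math.sqrt(sum(rj[s] ** 2 for s in items)))
    return num / den if den else 0.0
```

Returning 0 when the denominator vanishes (for example, when two users share no items) is a choice made for this sketch only.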

At the Rating Prediction stage of user-based CF methods, the predicted
rating r̂_ax of user u_a on item t_x is calculated as an aggregation of
other users' ratings on item t_x [2, 29]. The prediction r̂_ax is
computed as follows:

[Equation (6), which defines r̂_ax, is illegible in the source.]

where N_k(u_a) is a sorted set which contains user u_a's k nearest
neighbours, sorted by similarity in descending order, and sim(a, i) is
the similarity of u_a's ith neighbour in N_k(u_a).
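
To make the aggregation step concrete, the following sketch assumes the commonly used mean-centred, similarity-weighted average; this form is an assumption of the sketch and may differ in detail from Equation (6). The function name and the tuple layout of `neighbours` are hypothetical.

```python
def predict_rating(target_mean, neighbours, item):
    """Predict a rating from k selected neighbours.

    neighbours: list of (similarity, ratings_dict, mean_rating) tuples.
    Assumed form: target mean plus the similarity-weighted average of the
    neighbours' mean-centred ratings on `item`.
    """
    num = den = 0.0
    for sim, ratings, mean in neighbours:
        r = ratings.get(item, 0)
        if r != 0:                 # only neighbours who rated the item contribute
            num += sim * (r - mean)
            den += abs(sim)
    return target_mean + num / den if den else target_mean
```

When none of the selected neighbours has rated the item, the sketch falls back to the target user's mean rating.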


3.2 Differential Privacy 

Informally, differential privacy [7] is a scheme that minimises the
sensitivity of the output of a given statistical operation on two
datasets that differ in one record (the record to be protected).
Specifically, differential privacy guarantees that, whether or not one
specific record appears in a database, the privacy mechanism will shield
that record from the adversary. The strategy of differential privacy is
to add random noise to the result of a query function on the database.

To understand the spirit of differential privacy clearly, several terms
are introduced in advance. Firstly, X(x_1, x_2, ..., x_n) and
X'(x'_1, x'_2, ..., x'_n) are two databases with n entries which differ
in only one entry, where x_i and x'_i are the ith entries of X and X'. We
call X and X' neighbouring datasets. Secondly, f(X) is the query function
on database X; the response is the combination of the real answer
a = f(X) and a chosen random noise. Thirdly, the privacy mechanism T,
namely the response, is computed by T(X) = f(X) + Noise. A formal
definition of differential privacy is as follows:


Definition 1 (ε-Differential Privacy). A randomised mechanism T is
ε-differentially private if for all neighbouring datasets X and X', and
for all outcome sets S ⊆ Range(T), T satisfies:
Pr[T(X) ∈ S] ≤ exp(ε) · Pr[T(X') ∈ S], where ε is a privacy budget.


The privacy budget ε is set by the database owner. Usually, a smaller
ε denotes a higher privacy guarantee because the privacy budget ε
reflects the allowed magnitude of difference between the outputs on two
neighbouring datasets.

There are two main instantiations of the randomised mechanism T: the
Laplace mechanism and the Exponential mechanism [20]. The definitions
are given below:


Definition 2 (Laplace Mechanism [7]). Given a query function
f: D^n → R^d, the ε-differential privacy mechanism T satisfies
T(X) = f(X) + (Lap(Δf/ε))^d, where the sensitivity of function f is
Δf = max |f(X) − f(X')| over all neighbouring datasets X, X' ∈ D^n, and d
represents the dimension.

Definition 3 (Exponential Mechanism [20]). Given a score function
q(X, x) of a database X, which reflects the score of a query response x,
the exponential mechanism T provides ε-differential privacy if
T(X) = {the probability of a query response x ∝ exp(ε·q(X, x)/(2Δq))},
where Δq = max |q(X, x) − q(X', x)| denotes the sensitivity of q.
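
The two mechanisms can be sketched with the Python standard library as follows. This is an illustrative sketch for a scalar query only; the function names are hypothetical, and the exponential weights assume the standard form exp(ε·q/(2Δq)).

```python
import math
import random

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=random):
    """Return true_answer + Lap(sensitivity / epsilon) noise (scalar query)."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_answer + noise

def exponential_weights(scores, sensitivity, epsilon):
    """Unnormalised weights proportional to exp(eps * q / (2 * sensitivity))."""
    top = max(scores)  # subtracting the max avoids overflow; ratios are unchanged
    return [math.exp(epsilon * (q - top) / (2.0 * sensitivity)) for q in scores]

def exponential_mechanism(candidates, scores, sensitivity, epsilon, rng=random):
    """Sample one candidate with probability proportional to its weight."""
    weights = exponential_weights(scores, sensitivity, epsilon)
    return rng.choices(candidates, weights=weights, k=1)[0]
```

A smaller ε flattens the exponential weights and enlarges the Laplace scale, which matches the remark that a smaller budget gives a stronger privacy guarantee at the cost of noisier answers.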
3.3 Wallenius' Non-central Hyper-geometric Distribution

Wallenius' non-central hyper-geometric distribution is a distribution of
weighted sampling without replacement. Formally, it is defined as follows
[13]. We assume there are c distinct categories in the population, and
each category i contains m_i individuals, i ∈ [1, c]. All the individuals
from category i have the same weight ω_i, i ∈ [1, c]. The probability
that an individual is sampled at a given draw is proportional to its
weight ω_i. Let x_v = (x_1v, x_2v, ..., x_cv) denote the total number of
individuals of each category (colour) sampled after the first v draws.
The probability that the next draw gives an individual of category i is:

$$P_{i(v+1)}(x_v) = \frac{(m_i - x_{iv})\,\omega_i}{\sum_{j=1}^{c} (m_j - x_{jv})\,\omega_j} \qquad (7)$$

The weighted sampling process without replacement is repeated until k
individuals have been retained, namely, k = Σ_{i=1}^{c} x_i, where x_i
denotes the number of individuals sampled from category i by Wallenius'
non-central hyper-geometric distribution.
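
The sequential drawing process described by Equation (7) can be simulated directly. The sketch below is illustrative only; the function name and argument layout are hypothetical, and all weights are assumed to be positive.

```python
import random

def wallenius_sample(m, omega, k, rng=random):
    """Draw k individuals one at a time; return x, the count taken per category.

    m[i] is the size of category i and omega[i] > 0 its weight.
    """
    if k > sum(m):
        raise ValueError("cannot draw more individuals than the population holds")
    x = [0] * len(m)
    for _ in range(k):
        # Equation (7): category i is drawn with probability proportional to
        # (individuals of i still in the population) * (weight of i).
        mass = [(m[i] - x[i]) * omega[i] for i in range(len(m))]
        i = rng.choices(range(len(m)), weights=mass, k=1)[0]
        x[i] += 1
    return x
```

Setting every m_i to 1 gives the case used in this paper, where each user is one individual with its own weight.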

Wallenius [26] proposed the probability mass function for this
distribution in the univariate case (c = 2). Chesson expanded Wallenius's
solution to the multivariate case (c > 2). In this paper, we focus on the
probability mass function of the multivariate Wallenius' non-central
hyper-geometric distribution because we regard one user/item in a
recommender system as one individual in the distribution. The
multivariate probability mass function (PMF) is shown below:

$$mwnchypg = \Lambda(x)\, I(x), \qquad (8)$$

where $\Lambda(x) = \prod_{i=1}^{c} \binom{m_i}{x_i}$,
$I(x) = \int_0^1 \prod_{i=1}^{c} \left(1 - t^{\omega_i/d}\right)^{x_i} dt$,
$d = \omega \cdot (m - x) = \sum_{i=1}^{c} \omega_i (m_i - x_i)$,
$x = (x_1, x_2, \ldots, x_c)$, $m = (m_1, m_2, \ldots, m_c)$ and
$\omega = (\omega_1, \omega_2, \ldots, \omega_c)$.

In this paper, we mainly use the following properties to evaluate
different probabilistic approaches. Manly gave the approximate solution
μ* = (μ*_1, μ*_2, ..., μ*_c) to the mean μ = (μ_1, μ_2, ..., μ_c) of x
after the final draw:

$$\left(1 - \frac{\mu_1^*}{m_1}\right)^{1/\omega_1} = \left(1 - \frac{\mu_2^*}{m_2}\right)^{1/\omega_2} = \cdots = \left(1 - \frac{\mu_c^*}{m_c}\right)^{1/\omega_c} \qquad (9)$$

where Σ_{i=1}^{c} μ*_i = k and ∀i ∈ C: 0 ≤ μ*_i ≤ m_i.

Fog stated the following properties of Equation (9). Firstly, the
solution μ* is valid under the conditions that ∀i ∈ C: m_i > 0 and
ω_i > 0. Secondly, the mean given by Equation (9) is a good approximation
in most cases. Thirdly, Equation (9) is exact when all ω_i are equal.
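
The approximate mean has no closed form in general, but it can be found numerically. The sketch below assumes the usual form of Manly's condition, in which (1 − μ*_i/m_i)^(1/ω_i) takes a common value θ for every category, so that μ*_i = m_i(1 − θ^ω_i); θ is then located by bisection so that the means sum to k. The function name and the tolerance are hypothetical choices.

```python
def manly_mean(m, omega, k, tol=1e-12):
    """Approximate mean counts mu* with sum(mu*) = k, for 0 <= k <= sum(m)."""
    def total(theta):
        # If (1 - mu_i/m_i)^(1/omega_i) = theta for every i, then
        # mu_i = m_i * (1 - theta**omega_i).
        return sum(mi * (1.0 - theta ** wi) for mi, wi in zip(m, omega))

    lo, hi = 0.0, 1.0            # total(0) = sum(m) and total(1) = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if total(mid) > k:       # too many expected draws: theta must grow
            lo = mid
        else:
            hi = mid
    theta = (lo + hi) / 2.0
    return [mi * (1.0 - theta ** wi) for mi, wi in zip(m, omega)]
```

With equal weights the result reduces to μ*_i = k·m_i/Σ_j m_j, consistent with the third property above.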






4 A Generalised Privacy Attack for Recommender Systems

In this section, we first introduce a popular attack, the k nearest
neighbour attack, and then expand the concept to a general attack, the
β-k nearest neighbour attack.

4.1 k Nearest Neighbour Attack 

Calandrino et al. presented a user-based attack called the k Nearest
Neighbour (kNN) attack. Simply put, the kNN attack exploits the property
that users are more similar when they share the same ratings on
corresponding items to reveal a user's private data.

We assume that the recommendation algorithm (kNN CF recommendation) and
its parameter k are known to the attacker. Furthermore, the attacker's
auxiliary information consists of part of a target user u_a's rating
history, i.e., the attacker already knows the ratings of m items that u_a
has rated. Usually, m ≈ 8. The attacker aims to learn those of u_a's
transactions that he does not yet know about.

To achieve this goal, the attacker first creates k fake users who have
the same ratings as u_a on only the m items. With a high probability,
each fake user's k nearest neighbour set N_k(fake user) will include the
other k − 1 fake users and the target user u_a. Because the target user
u_a is the only neighbour who has ratings on the items which are not
rated by the fake users, to provide recommendations on these items to the
fake users, the recommender system has to give u_a's ratings to the fake
users directly. In this way, the fake users learn the target user u_a's
whole rating list with the kNN attack.
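
The attack can be illustrated on a toy dataset. Everything in the sketch below is made up for illustration: the user names, the ratings, the use of a cosine similarity that treats unrated items as 0, and a plain similarity-weighted average as the prediction rule. It is not the attack code of Calandrino et al.

```python
import math

def cosine_full(ri, rj):
    """Cosine similarity treating unrated items as 0 (a simplification)."""
    num = sum(ri[s] * rj.get(s, 0) for s in ri)
    den = (math.sqrt(sum(v * v for v in ri.values()))
           * math.sqrt(sum(v * v for v in rj.values())))
    return num / den if den else 0.0

def knn(users, name, k):
    """Names of the k users most similar to `name` (excluding itself)."""
    others = [u for u in users if u != name]
    others.sort(key=lambda u: cosine_full(users[name], users[u]), reverse=True)
    return others[:k]

def predict(users, name, item, k):
    """Similarity-weighted average over the k nearest neighbours who rated item."""
    rated = [(cosine_full(users[name], users[u]), users[u][item])
             for u in knn(users, name, k) if item in users[u]]
    total = sum(s for s, _ in rated)
    return sum(s * r for s, r in rated) / total if total else None

# Target user and two unrelated users (made-up data).
users = {
    "target": {"i1": 5, "i2": 1, "i3": 4, "i4": 2, "secret": 3},
    "other1": {"i1": 1, "i2": 5, "secret": 5},
    "other2": {"i3": 1, "i4": 5, "secret": 1},
}
# The attacker knows the target's ratings on m = 4 items, but not "secret".
known = {s: r for s, r in users["target"].items() if s != "secret"}
k = 3
for n in range(k):                      # the attacker registers k fake users
    users[f"fake{n}"] = dict(known)

neighbours = knn(users, "fake0", k)     # the other fake users and the target
leaked = predict(users, "fake0", "secret", k)
```

In this toy run the only neighbour of `fake0` that has rated `secret` is the target user, so the prediction returned to the fake user coincides with the target's own rating.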

4.2 β-k Nearest Neighbours Attack

In light of the existing privacy preserving neighbourhood-based CF
recommendation methods, we expand the kNN attack to a more general case,
named the β-k Nearest Neighbour (β-kNN) attack.

To preserve the target user u_a's private information against the kNN
attack, we should avoid selecting the true k nearest neighbours, so the
existing methods apply randomisation techniques. However, suppose the
final k neighbours are selected from the top βk users of u_a's candidate
list, and the parameters β and k are known to the attacker; then the
attacker could learn u_a's private data with a high probability by
creating βk fake users. When β is not large enough, it is still not
difficult to break privacy preserving neighbourhood-based CF recommender
systems. Therefore, the β-kNN attack can flexibly adjust the size of the
fake user set to improve the attack effectiveness. In fact, the kNN
attack can be regarded as the 1-kNN attack in the expanded case of the
β-kNN attack.

In the β-kNN attack, β can be treated as a security measure because a
greater value of β represents a higher fraud cost. We will show the
relationship between the prediction utility and β in a later section.


5 Privacy Preservation by Partitioned Probabilistic Neighbour Selection

In this section, we first provide two performance metrics for privacy
preserving neighbourhood-based CF recommender systems against the β-kNN
attack. Then we propose our Partitioned Probabilistic Neighbour Selection
algorithm based on the previous analysis.


5.1 Performance Metrics 

5.1.1 Accuracy Metric 

For any privacy preserving neighbourhood-based CF recommender system,
the greater the sum of similarity of the selected k neighbours, the
better the predicted rating value will be. The reason is simple: the
closer a neighbour is to the target user u_a, the more reliable the
predicted result; namely, we prefer the method which yields the greater
similarity sum. Therefore, we define the accuracy metric α as the sum of
the k selected neighbours' similarity.

Because we propose a random neighbour selection method, the accuracy
metric α should be regarded as the expected sum of the k selected
neighbours' similarity. However, it is not straightforward to compute
the expectation of the k neighbours' similarity sum,
E(Σ_{i ∈ N_k(u_a)} sim(a, i)), directly, because we would need to find
all the user combinations and their corresponding probabilities. So we
give another way to compute this expectation:

$$E\Big(\sum_{i \in N_k(u_a)} sim(a,i)\Big) = \sum_{i=1}^{c} \big(sim(a,i)\,E(x_i)\big) = \sum_{i=1}^{c} \big(sim(a,i)\,\mu_i^*\big), \qquad (10)$$

see Section 3.3 for the definitions of x_i and μ*. So we compute the
accuracy by the following equation in this paper:

$$\alpha = \sum_{i=1}^{c} sim(a,i)\,\mu_i^* \qquad (11)$$

5.1.2 Security Metric 

According to the β-kNN attack, suppose the final k neighbours are
selected from the top βk users of u_a's candidate list. We assume that
the parameters β and k are known to the attacker, so the attacker could
learn u_a's private data with a high probability through the same process
as the kNN attack by creating βk fake users. When β is not large, it is
still not difficult to break privacy preserving recommender systems.
Therefore, we define β as the security metric: a greater value of β
denotes a higher fraud cost for the attacker. Namely, we want to achieve
a trade-off between the security metric β and a fixed prediction accuracy
metric α.


5.2 Partitioned Probabilistic Neighbour Selection Algorithm 


According to the motivation and previous analysis, we first provide an
original version of our Partitioned Probabilistic Neighbour Selection
algorithm. We partition the target user's candidate list (in descending
order of similarity value) into partitions of size k, then apply a
geometric distribution over the partitions to select ⌈p(1 − p)^(i−1) k⌉
neighbours (applying exponential differential privacy within every
partition) from partition i until we have a total of k neighbours, where
the integer i ∈ [1, +∞) and p is the geometric distribution parameter. It
is clear that our original partitioned probabilistic neighbour scheme
satisfies property (1) (easy implementation) in Section 1, for it does
not introduce any extra computational cost. In fact, it is natural to
regard low neighbour quality as noise in the process of neighbour
selection, since low neighbour quality has the same impact on the
prediction accuracy as noise. So our method satisfies property (3)
(decreasing the magnitude of noise) in Section 1 in two ways: 1. it only
adds noise in the process of neighbour selection; 2. it controls the
neighbour quality by tuning the geometric distribution parameter p in the
process of neighbour selection. However, the original version does not
satisfy properties (2) (keeping differential privacy) and (4)
(quantifying the accuracy and security). We now show the reasons and
modify it to satisfy properties (2) and (4).

In the original version, we select ⌈p(1 − p)^(i−1) k⌉ neighbours with
exponential differential privacy from partition i until we have k
neighbours. This breaks differential privacy for the same reason as the
PNCF method [29] (see details in Section 2.2). Simply speaking, there may
exist some users whose probability of selection changes from zero to a
positive number because of a tiny change in the rating set. To guarantee
the prediction accuracy, we modify the original scheme only by changing
the way we select the last neighbour (see details in the next paragraph).
The modified scheme keeps differential privacy because, no matter how we
change the dataset, every candidate's probability of selection cannot be
zero. To quantify the level of recommendation accuracy and system
security, we use the performance metrics α and β. We compute the
parameter p and the security metric β from a given α by Equation (20).


Algorithm 1 shows the Partitioned Probabilistic Neighbour Selection
(PPNS) algorithm. In lines 1 to 5, we compute the necessary parameters by
the equations given earlier and Equation (20). In lines 6 to 18, we
select the k neighbours by Partitioned Probabilistic Neighbour Selection,
then return the target user's k neighbours and the security metric value
β. We first mark all of the partitions as unvisited. Next, we select
⌈p(1 − p)^(i−1) k⌉ neighbours with exponential differential privacy from
partition i (marking this partition as visited) until we have a total of
k − 1 neighbours. Finally, we select the last neighbour from all the
unvisited partitions.


Algorithm 1 Partitioned Probabilistic Neighbour Selection.

Input:
    Original user-item rating set, $\mathcal{R}$;
    Target user, $u_a$, and prediction item;
    Number of neighbours, $k$;
    Differential privacy parameter, $\epsilon$;
    Accuracy metric, $\alpha$.

Output:
    Target user $u_a$'s $k$-neighbour set, $N_k(u_a)$;
    Security metric, $\beta$.

 1: Compute the similarity list for target user $u_a$, $S_a$;
 2: Sort $S_a$ in descending order;
 3: Compute the exponential differential privacy sensitivity, $RS$;
 4: Compute user $u_i$'s selection weight, $\omega_i$;
 5: Compute the geometric distribution parameter, $p$;
 6: Partition the sorted $S_a$ by $k$;
 7: for $i = 1$ to $n$ do
 8:     if NeighbourNumber $< k - 1$ then
 9:         Select $\lceil p(1-p)^{i-1}k \rceil$ neighbours from partition $i$ to $N_k(u_a)$;
10:         Mark partition $i$ as visited;
11:         NeighbourNumber += $\lceil p(1-p)^{i-1}k \rceil$;
12:     else
13:         break;
14:     end if
15: end for
16: Select one neighbour from the unvisited partitions;
17: $\beta$ = last neighbour's partition index number;
18: return $N_k(u_a)$, $\beta$;
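
The selection loop of Algorithm 1 (lines 6 to 18) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `ppns_select`, the `weight` callback (standing in for the selection weight of line 4) and the cap that stops the loop at exactly $k - 1$ neighbours are all hypothetical choices made for the example.

```python
import math
import random


def ppns_select(sorted_sims, k, p, weight, rng):
    """Sketch of Algorithm 1, lines 6 to 18.

    sorted_sims: similarities in descending order (index 0 = most similar user).
    weight: hypothetical function mapping a similarity to a positive weight.
    Returns (chosen user indices, beta); beta is the 1-based partition index
    of the last neighbour.
    """
    n = len(sorted_sims)
    # Line 6: exclusive partitions of size k over the sorted list.
    partitions = [list(range(s, min(s + k, n))) for s in range(0, n, k)]

    def weighted_pick(pool, count):
        # Weighted sampling without replacement, one user at a time.
        pool, picked = list(pool), []
        for _ in range(min(count, len(pool))):
            weights = [weight(sorted_sims[u]) for u in pool]
            u = rng.choices(pool, weights=weights, k=1)[0]
            pool.remove(u)
            picked.append(u)
        return picked

    chosen, visited = [], 0
    # Lines 7 to 15: visit partitions in order until k - 1 neighbours are chosen.
    for i, part in enumerate(partitions, start=1):
        if len(chosen) >= k - 1:
            break
        quota = math.ceil(p * (1 - p) ** (i - 1) * k)
        quota = min(quota, k - 1 - len(chosen))  # assumption: cap at k - 1 in total
        chosen += weighted_pick(part, quota)
        visited = i

    # Line 16: the last neighbour comes from the unvisited partitions only.
    rest = [u for part in partitions[visited:] for u in part]
    if not rest:
        raise ValueError("no unvisited partition left for the last neighbour")
    last = weighted_pick(rest, 1)[0]
    chosen.append(last)
    beta = last // k + 1  # Line 17: partition index of the last neighbour.
    return chosen, beta


# Illustrative run on synthetic similarities (not data from the paper).
rng = random.Random(0)
sims = sorted((rng.random() for _ in range(1000)), reverse=True)
neighbours, beta = ppns_select(sims, k=50, p=0.5, weight=math.exp, rng=rng)
print(len(neighbours), beta)
```

With $k = 50$ and $p = 0.5$ the per-partition quotas are 25, 13, 7 and 4, so the first four partitions supply $k - 1 = 49$ neighbours and the last neighbour is drawn from partition 5 onwards.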


6 Theoretical Analysis 

In this section, we use the multivariate Wallenius' non-central
hypergeometric distribution to analyse, theoretically, both the accuracy and
the security against the kNN attack of any randomised neighbour selection
method. The reason is that both the multivariate Wallenius' non-central
hypergeometric distribution and randomised neighbour selection methods are
weighted sampling without replacement: the samples are selected one by one
from the universe, and the sampling weight depends only on each sample's
attribute, i.e., the ball's colour or the user's similarity.

6.1 Accuracy Analysis 

In this part, to analyse the accuracy performance, we first adapt the
earlier equation to a general randomised neighbour selection method. As the
selection weight in a general probabilistic neighbour selection method
relies only on the user's similarity, we regard user $u_i$'s similarity
$sim(a, i)$ as the sample's colour in the multivariate Wallenius'
non-central hypergeometric distribution. Thus, in randomised neighbour
selection methods, $m_i = 1$, $c = n$ and $N = \sum_{i=1}^{n} m_i = n$.
Therefore, we rewrite the equation as:

$$A = (1 - \mu_1)^{1/\omega_1} = (1 - \mu_2)^{1/\omega_2} = \cdots = (1 - \mu_n)^{1/\omega_n}, \tag{12}$$

where $A$ is a constant, $\mu_i$ is the expected number of times user $u_i$
is selected, and $\omega_i$ is user $u_i$'s selection weight.
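
Equation (12) can be solved numerically once the number of selected neighbours is fixed. The sketch below assumes the constraint $\sum_{i=1}^{n} \mu_i = k$ (the same constraint used later in the proof of Theorem 1) and finds $A$ by bisection; the function name `expected_selection` and the example weights are illustrative, not taken from the paper.

```python
def expected_selection(weights, k, tol=1e-12):
    """Solve Equation (12) for A, then return (A, [mu_1, ..., mu_n]).

    Uses mu_i = 1 - A**omega_i and the assumed constraint sum(mu_i) = k.
    """
    lo, hi = 0.0, 1.0
    # sum(1 - A**w) decreases from n (at A = 0) to 0 (at A = 1), so bisect on A.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        total = sum(1 - mid ** w for w in weights)
        if total > k:
            lo = mid
        else:
            hi = mid
    a = (lo + hi) / 2
    return a, [1 - a ** w for w in weights]


# Illustrative weights: a larger weight gives a larger expected selection count.
a, mu = expected_selection([2.0, 1.5, 1.0, 1.0], k=2)
print(a, mu)
```

Users with equal weights receive equal $\mu_i$, and the $\mu_i$ always sum to $k$, which is the property Lemma 1 and Theorem 1 rely on.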

We now evaluate Partitioned Probabilistic Neighbour Selection by
Equation (12). For simplicity, we also partition the candidate list in the
PNCF method [29] and in Probabilistic Neighbour Selection by the given $k$.


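Lemma 1. Let $C$ be a set of size $n$. We independently draw samples from
$C$ twice with the multivariate Wallenius' non-central hypergeometric
distribution; let $\mu_i$ and $\hat{\mu}_i$ be the expected number of
sample $i$ in the two samplings. Then $\forall i \in [1, n]$,
$\mu_i > \hat{\mu}_i \Leftrightarrow \sum_{i=1}^{n} \mu_i > \sum_{i=1}^{n} \hat{\mu}_i$.

Proof. Let $\sum_{i=1}^{n} \mu_i = \lambda$, $\sum_{i=1}^{n} \hat{\mu}_i = \hat{\lambda}$,
$A = (1 - \mu_i)^{1/\omega_i}$ and $\hat{A} = (1 - \hat{\mu}_i)^{1/\omega_i}$.

(1) Proof of the sufficient condition,
$\mu_i > \hat{\mu}_i \Rightarrow \sum_{i=1}^{n} \mu_i > \sum_{i=1}^{n} \hat{\mu}_i$.
The size of the set $C$ stays the same in both samplings,

$$\therefore \forall i \in [1, n],\ \mu_i > \hat{\mu}_i \Rightarrow \sum_{i=1}^{n} \mu_i > \sum_{i=1}^{n} \hat{\mu}_i.$$

(2) Proof of the necessary condition,
$\sum_{i=1}^{n} \mu_i > \sum_{i=1}^{n} \hat{\mu}_i \Rightarrow \mu_i > \hat{\mu}_i$.
According to Equation (12), we have

$$A = (1 - \mu_i)^{1/\omega_i} \Rightarrow \mu_i = 1 - A^{\omega_i},$$

$$\therefore \lambda = \sum_{i=1}^{n} \mu_i = n - \sum_{i=1}^{n} A^{\omega_i}.$$

Similarly, $\hat{\lambda} = n - \sum_{i=1}^{n} \hat{A}^{\omega_i}$.

$$\because \lambda = \sum_{i=1}^{n} \mu_i > \sum_{i=1}^{n} \hat{\mu}_i = \hat{\lambda} \Rightarrow A < \hat{A},$$

and $A$, $\hat{A}$ and $\mu_i$, $\hat{\mu}_i$ share the same $\omega_i$,

$$\Rightarrow (1 - \mu_i)^{1/\omega_i} < (1 - \hat{\mu}_i)^{1/\omega_i} \Rightarrow \mu_i > \hat{\mu}_i.$$

Therefore, we have $\forall i \in [1, n]$,
$\mu_i > \hat{\mu}_i \Leftrightarrow \sum_{i=1}^{n} \mu_i > \sum_{i=1}^{n} \hat{\mu}_i$.

Lemma 1 shows that, when several randomised neighbour selection methods
select neighbours with the multivariate Wallenius' non-central
hypergeometric distribution from a partition of the same size, if one
method selects more neighbours, then the expected number of each neighbour
in that method is greater too, and vice versa.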


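Lemma 2. The method which selects more users from the first partition
(containing users $u_1$ to $u_k$) of a descending-order similarity list
yields a better rating prediction, i.e.,
$\sum_{i=1}^{k} \mu_i > \sum_{i=1}^{k} \hat{\mu}_i \Rightarrow \alpha > \hat{\alpha}$,
where $\alpha$ denotes the accuracy metric value.

Proof. Let $\lambda_j = \sum_{i \in group_j} \mu_i$ and
$\hat{\lambda}_j = \sum_{i \in group_j} \hat{\mu}_i$, so that
$\sum_j \lambda_j = \sum_{i=1}^{n} \mu_i$. Both methods select $k$
neighbours in total, so $\sum_j \lambda_j = \sum_j \hat{\lambda}_j$.
Assume an extreme case:

$$\lambda_1 > \hat{\lambda}_1, \quad \lambda_2 < \hat{\lambda}_2, \quad \lambda_3 < \hat{\lambda}_3, \quad \ldots$$

$$\therefore \lambda_1 - \hat{\lambda}_1 = (\hat{\lambda}_2 - \lambda_2) + (\hat{\lambda}_3 - \lambda_3) + \cdots.$$

It is obvious that every term on both sides of the above equation is $> 0$.
According to Lemma 1 ($\lambda > \hat{\lambda} \Leftrightarrow \mu_i > \hat{\mu}_i$),
we have

$$\sum_{group_1} (\mu_i - \hat{\mu}_i) = \sum_{group_2} (\hat{\mu}_i - \mu_i) + \sum_{group_3} (\hat{\mu}_i - \mu_i) + \cdots,$$

and every $(\cdot) > 0$.

$$\because 1 \ge sim(a, i) \ge sim(a, j) \ge 0, \quad (i < j),$$

$$\therefore \sum_{group_1} sim(a, i)(\mu_i - \hat{\mu}_i) > \sum_{group_2} sim(a, i)(\hat{\mu}_i - \mu_i) + \sum_{group_3} sim(a, i)(\hat{\mu}_i - \mu_i) + \cdots.$$

$$\therefore \sum_{i=1}^{n} sim(a, i)\mu_i > \sum_{i=1}^{n} sim(a, i)\hat{\mu}_i.$$

By the definition of the accuracy metric, we have $\alpha > \hat{\alpha}$.

Therefore, the method which selects more users from the first group is
more reliable on the predicted rating value. □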


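Theorem 1. If $p > 1 - \left(\frac{n-k}{n}\right)^{\exp\left(\frac{\epsilon}{4k \cdot RS}\right)}$,
the recommendation accuracy of Partitioned Probabilistic Neighbour
Selection is better than that of the PNCF method [29] and Probabilistic
Neighbour Selection.


Proof. We first consider the best case for the PNCF method [29] and
Probabilistic Neighbour Selection:
$sim(a, 1) = \cdots = sim(a, k) = 1 > 0 = sim(a, k+1) = \cdots = sim(a, n)$.
In this case $\mu_1 = \cdots = \mu_k$ and $\mu_{k+1} = \cdots = \mu_n$,

$$\therefore k = k\mu_1 + (n - k)\mu_n.$$

According to Equation (12), $A = (1 - \mu_1)^{1/\omega_1} = (1 - \mu_n)^{1/\omega_n}$,
so $\mu_n = 1 - (1 - \mu_1)^{\omega_n/\omega_1}$. Let
$\Delta = \mu_1 - \mu_n = (1 - \mu_1)^{\omega_n/\omega_1} - (1 - \mu_1)$; then
$\mu_1 = \frac{k}{n} + \frac{n-k}{n}\Delta < \frac{k}{n} + \Delta$, namely,
$\mu_1 < \frac{k}{n} + (1 - \mu_1)^{\omega_n/\omega_1} - (1 - \mu_1)$, then

$$\mu_1 < 1 - \left(\frac{n-k}{n}\right)^{\omega_1/\omega_n},$$

where $\omega_1/\omega_n = \exp\left(\frac{\epsilon}{4k \cdot RS}\right)$ in
this best case.

In the PNCF method [29] and Probabilistic Neighbour Selection,
$\sum_{i \in group_1} \mu_i = k\mu_1 < k\left(1 - \left(\frac{n-k}{n}\right)^{\exp\left(\frac{\epsilon}{4k \cdot RS}\right)}\right)$,
while in Partitioned Probabilistic Neighbour Selection,
$\sum_{i \in group_1} \mu_i = pk$. Therefore, according to Lemma 2, when
$p > 1 - \left(\frac{n-k}{n}\right)^{\exp\left(\frac{\epsilon}{4k \cdot RS}\right)}$,
$\alpha > \hat{\alpha}$, namely, the recommendation accuracy of Partitioned
Probabilistic Neighbour Selection is better than that of the PNCF method
[29] and Probabilistic Neighbour Selection. □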

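Having qualitatively compared the recommendation accuracy of Partitioned
Probabilistic Neighbour Selection with that of the PNCF method [29] and
Probabilistic Neighbour Selection, we now provide the quantitative analysis
of our Partitioned Probabilistic Neighbour Selection. Let $\alpha_0$ be the
initial accuracy metric.

$$\because \frac{\sum_{i=1}^{k} sim(a, i)\mu_i}{\sum_{i=1}^{k} \mu_i} - \frac{\sum_{i=1}^{k} sim(a, i)}{k} \ge 0,$$

then we have

$$\frac{\sum_{i=1}^{k} \mu_i}{\sum_{i=1}^{k} 1} \le \frac{\sum_{i=1}^{k} sim(a, i)\mu_i}{\sum_{i=1}^{k} sim(a, i)}.$$

Namely,

$$p = \frac{pk}{k} = \frac{\sum_{i=1}^{k} \mu_i}{\sum_{i=1}^{k} 1} \le \frac{\sum_{i=1}^{k} sim(a, i)\mu_i}{\sum_{i=1}^{k} sim(a, i)} \le \frac{\sum_{i=1}^{k} sim(a, i)\mu_i + \sum_{i=k+1}^{2k} sim(a, i)\mu_i + \cdots}{\sum_{i=1}^{k} sim(a, i)} = \frac{\alpha}{\sum_{i=1}^{k} sim(a, i)}.$$

Thus, $p \le \frac{\alpha}{\sum_{i=1}^{k} sim(a, i)}$. Namely, when
$p \ge \frac{\alpha_0}{\sum_{i=1}^{k} sim(a, i)}$, the actual accuracy
$\alpha$ must be at least $\alpha_0$. Therefore, the range of $p$ that
guarantees the accuracy metric $\alpha \ge \alpha_0$ is

$$p \in \left[\frac{\alpha_0}{\sum_{i=1}^{k} sim(a, i)},\ 1\right].$$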
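
As a small worked illustration of this range, the lower end can be computed directly from the sorted similarity list. The sketch assumes the bound reads $p \ge \alpha_0 / \sum_{i=1}^{k} sim(a, i)$, as the derivation above suggests; the function name `min_p_for_accuracy` and the sample similarities are hypothetical.

```python
def min_p_for_accuracy(sorted_sims, k, alpha0):
    """Smallest p with p * (sum of the top-k similarities) >= alpha0."""
    top_k_sum = sum(sorted_sims[:k])
    if top_k_sum <= 0:
        raise ValueError("the top-k similarities must have a positive sum")
    p_min = alpha0 / top_k_sum
    if p_min > 1:
        raise ValueError("alpha0 exceeds what even p = 1 can guarantee")
    return p_min


# Illustrative similarities in descending order (not data from the paper).
sims = [0.9, 0.8, 0.7, 0.2, 0.1]
print(min_p_for_accuracy(sims, k=3, alpha0=1.2))
```

Any $p$ between the returned value and 1 then satisfies the accuracy requirement $\alpha \ge \alpha_0$ under this reading of the bound.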


6.2 Security Analysis 

In this section, we first provide the range of $p$ for which our approach
guarantees the system security against the kNN attack. Next, we present the
quantitative analysis by providing a relationship between the probabilistic
parameter $p$ and the security metric $\beta$.

In the PNCF method [29], according to the equation given earlier, the
probability mass function is:

$$PMF = I(x) = \int_{0}^{1} \prod_{i=1}^{n} \left(1 - t^{\omega_i/d}\right)^{x_i} dt, \tag{13}$$

$$d = \omega \cdot (m - x) = \sum_{i=1}^{n} \omega_i (m_i - x_i), \tag{14}$$

where $x = (x_1, x_2, \ldots, x_n)$, $\omega = (\omega_1, \omega_2, \ldots, \omega_n)$.

For the case of selecting the top-$k$ users, we have:

$$x_i = \begin{cases} 1 & i \in [1, k] \\ 0 & i \in [k+1, n] \end{cases} \tag{15}$$

Thus, the probability of selecting the top-$k$ users in the PNCF method [29]
and Probabilistic Neighbour Selection is:

$$Pr = I(x) = \int_{0}^{1} \prod_{i=1}^{k} \left(1 - t^{\omega_i/d}\right) dt > 0, \tag{16}$$

$$d = \omega \cdot (m - x) = \sum_{i=k+1}^{n} \omega_i. \tag{17}$$
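
The integral in Equation (16) has no simple closed form in general, but it is easy to evaluate numerically. The following sketch uses a plain midpoint rule; the function name `top_k_probability`, the step count and the example weights are assumptions made for illustration only.

```python
def top_k_probability(weights, k, steps=20_000):
    """Midpoint-rule estimate of Equation (16).

    weights: selection weights omega_i, ordered so the first k entries
    belong to the top-k users. Requires k < len(weights).
    """
    if not 0 < k < len(weights):
        raise ValueError("k must satisfy 0 < k < n")
    d = sum(weights[k:])  # Equation (17): total weight of the unselected users.
    h = 1.0 / steps
    total = 0.0
    for s in range(steps):
        t = (s + 0.5) * h
        prod = 1.0
        for w in weights[:k]:
            prod *= 1 - t ** (w / d)
        total += prod
    return total * h


# Two users with weights 3 and 1: the first is picked with probability 3/4.
print(top_k_probability([3.0, 1.0], k=1))
```

For two users and $k = 1$ the integral reduces to $\omega_1 / (\omega_1 + \omega_2)$, which gives a quick sanity check of the formula; the result is strictly positive whenever all weights are positive, which is the point of Equation (16).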

In Partitioned Probabilistic Neighbour Selection, because we actually select
$\lceil pk \rceil$ users from the top-$k$ users, when $p \le \frac{k-1}{k}$
the probability of selecting the top-$k$ users as the final $k$ neighbours
is 0. Namely, we provide absolute security against the kNN attack when
setting $p \le \frac{k-1}{k}$.

To compute the value of $\beta$ according to our selection process, recall
that we select $p(1-p)^{i-1}k$ users from group $i$. The first group from
which we select only one user, group $j$, therefore obeys the following
inequality:

$$p(1-p)^{j-1}k < \frac{3}{2} \Rightarrow j > 1 + \frac{(\ln 3 - \ln 2) - \ln pk}{\ln(1-p)}. \tag{18}$$

Before group $j$, we have selected $pk + p(1-p)k + \cdots + p(1-p)^{j-2}k$
users, so $(1-p)^{j-1}k$ users remain to be selected. Since each of these
$(1-p)^{j-1}k$ users comes from a separate group, the total number of groups
from which the $k$ neighbours come is:

$$\beta = (j - 1) + \left(k - \sum_{i=1}^{j-1} p(1-p)^{i-1}k\right) = (j - 1) + (1-p)^{j-1}k. \tag{19}$$
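
Equations (18) and (19) can be evaluated directly. The sketch below takes $j$ to be the smallest integer satisfying the inequality $p(1-p)^{j-1}k < 3/2$ and then computes $\beta = (j-1) + (1-p)^{j-1}k$; both the reading of the inequality and the function name `security_metric` are assumptions of this example.

```python
import math


def security_metric(p, k):
    """Evaluate Equations (18) and (19) for 0 < p < 1.

    Returns (j, beta), where j is the smallest integer with
    p * (1 - p)**(j - 1) * k < 3/2.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    bound = 1 + (math.log(3) - math.log(2) - math.log(p * k)) / math.log(1 - p)
    j = max(1, math.floor(bound) + 1)  # strict inequality j > bound
    beta = (j - 1) + (1 - p) ** (j - 1) * k
    return j, beta


print(security_metric(p=0.5, k=50))
```

For $p = 0.5$ and $k = 50$ this gives $j = 6$: group 5 would still contribute $0.5 \cdot 0.5^{4} \cdot 50 \approx 1.56$ users, while group 6 contributes about 0.78, the first value below $3/2$.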


6.3 Analysis Results 


According to the previous analysis, when setting the probabilistic parameter
$p$ as $1 - \left(\frac{n-k}{n}\right)^{\exp\left(\frac{\epsilon}{4k \cdot RS}\right)} < p \le \frac{k-1}{k}$,
our Partitioned Probabilistic Neighbour Selection achieves better
recommendation accuracy than Private Neighbour Selection [29] and
Probabilistic Neighbour Selection. We then give the relationship between the
accuracy metric $\alpha$ and the security metric $\beta$ of our Partitioned
Probabilistic Neighbour Selection by the following equation:

$$\begin{cases} p \in \left[\dfrac{\alpha_0}{\sum_{i=1}^{k} sim(a, i)},\ 1\right] \\[2ex] j > 1 + \dfrac{(\ln 3 - \ln 2) - \ln pk}{\ln(1-p)} \\[2ex] \beta = (j - 1) + (1-p)^{j-1}k \end{cases} \tag{20}$$

We guarantee to achieve an accuracy of $\alpha_0$ against the $\beta$-kNN
attack.

6.4 A Representative Example

In this section, we show a simple but representative example of the range of
the probabilistic parameter $p$. Suppose $k = \theta n$, $\theta \in (0, 1]$.
The lower bound of $p$,
$1 - \left(\frac{n-k}{n}\right)^{\exp\left(\frac{\epsilon}{4k \cdot RS}\right)} = 1 - (1 - \theta)^{\exp\left(\frac{\epsilon}{4k \cdot RS}\right)}$,
is a monotone-increasing function of $\theta$. Because the value of $\theta$
is always small ($k \in [30, 50]$ and $n$ is always greater than 1000), the
lower bound of $p$ will be very small. Meanwhile, the upper bound of $p$ is
a number close to 1. Therefore, the admissible range of $p$ covers most of
the interval $(0, 1)$.

We now show an example in a real scenario. Let $k = 50$, $n = 500$,
$\epsilon = 1$, $RS = 1$, so the lower bound of $p$ would be

$$1 - \left(\frac{n-k}{n}\right)^{\exp\left(\frac{\epsilon}{4k \cdot RS}\right)} = 1 - \left(\frac{500 - 50}{500}\right)^{\exp\left(\frac{1}{4 \times 50 \times 1}\right)} \approx 0.1, \tag{21}$$

and the upper bound of $p$ would be $\frac{k-1}{k} = 0.98$. Thus, in the
above real scenario, when we set $p$ in the range of
$(0.1, 0.98] \subset [0, 1)$, Partitioned Probabilistic Neighbour Selection
would yield better recommendation accuracy against the kNN attack.
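
The two bounds in this example can be checked with a few lines of Python. The sketch assumes the lower bound has the form $1 - ((n-k)/n)^{\exp(\epsilon/(4k \cdot RS))}$ and the upper bound is $(k-1)/k$, which is consistent with the numbers 0.1 and 0.98 quoted in the text; the function name `p_range` is hypothetical.

```python
import math


def p_range(n, k, eps, rs):
    """Return the (exclusive lower, inclusive upper) bounds on p.

    Assumed forms: lower = 1 - ((n - k) / n) ** exp(eps / (4 * k * rs)),
    upper = (k - 1) / k.
    """
    lower = 1 - ((n - k) / n) ** math.exp(eps / (4 * k * rs))
    upper = (k - 1) / k
    return lower, upper


lower, upper = p_range(n=500, k=50, eps=1.0, rs=1.0)
print(round(lower, 4), upper)
```

With $k = 50$, $n = 500$, $\epsilon = 1$ and $RS = 1$ the exponent $\exp(1/200)$ is only slightly above 1, so the lower bound stays close to $k/n = 0.1$.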


7 Performance Evaluation 


In Section 6, we theoretically analysed the performance on both
recommendation accuracy and system security, and proved that, while
successfully preserving customers' privacy against the kNN attack, our
method ensures better recommendation accuracy than the PNCF method [29] and
Probabilistic Neighbour Selection. In this section, we compare the
recommendation accuracy of Partitioned Probabilistic Neighbour Selection
with that of the global neighbour selection methods, PNCF [29] and
Probabilistic Neighbour Selection, by experiments on a real-world dataset.

The dataset in the experiments is the MovieLens dataset¹. The MovieLens
dataset consists of 100,000 ratings (integer stars from 1 to 5) from 943
users on 1682 films, where each user has rated more than 20 films, and each
film received ratings from 20 to 250 users. Specifically, we randomly select
one rating of a random user, and then predict this rating by k Nearest
Neighbour (kNN), Partitioned Probabilistic Neighbour Selection (PPNS),
Probabilistic Neighbour Selection (nPNS), and Private Neighbour Selection
Collaborative Filtering (PNCF) [29].

¹ http://grouplens.org/datasets/movielens/

In this paper, we use a well-known measurement metric, Mean Absolute Error
(MAE) [29], to measure the recommendation accuracy:

$$MAE = \frac{1}{T} \sum_{i \in T} \left| r_{ai} - \hat{r}_{ai} \right|, \tag{22}$$

where $r_{ai}$ is the real rating of user $u_a$ on item $t_i$,
$\hat{r}_{ai}$ is the predicted rating, and $T$ is the number of tests. To
guarantee a reasonable experimental result, in our experiments
$r_{ai} \ne 0$. Clearly, a lower MAE value denotes a better prediction
accuracy. Note that in each experiment, we consider the kNN CF
recommendation method as a baseline (the best method on accuracy
performance).
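
Equation (22) translates directly into code. The helper below is a minimal sketch; the function name `mean_absolute_error` and the sample ratings are illustrative and do not come from the experiments.

```python
def mean_absolute_error(actual, predicted):
    """Mean Absolute Error over T paired (real, predicted) ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(abs(r - r_hat) for r, r_hat in zip(actual, predicted)) / len(actual)


# Three illustrative test ratings and their predictions.
print(mean_absolute_error([4, 3, 5], [3.5, 3.0, 4.0]))
```

Here the absolute errors are 0.5, 0 and 1, so the MAE over the three tests is 0.5.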

In our experiments, we compute the parameter $RS$ following the previous
work [29]. We set $T = 10{,}000$, namely, we repeat the experiment 10,000
times to compute the MAE. Specifically, we randomly select one target user
and item each time. Our experiments are run on user-based CF (because both
the kNN attack and the $\beta$-kNN attack are user-based attacks), and we
use the cosine-based metric to compute the similarity between users. Table 1
and Figure 1 show the relationship between the accuracy performance of
Partitioned Probabilistic Neighbour Selection and the parameter $p$, where
we set $\epsilon = 1$, $k = 50$, $\rho = 0.5$. Table 2 and Figure 2 show the
relationship between the security performance of Partitioned Probabilistic
Neighbour Selection (value of $\beta$) and the parameter $p$, where the
total partition number is 19. Table 3 and Figure 3 show the relationship
between the accuracy performance of all four methods and the parameter $k$,
where we set $\epsilon = 1$, $p = 0.5$, $\rho = 0.5$. Table 4 and Figure 4
show the relationship between the accuracy performance of PNCF [29] and the
parameter $\rho$, where we set $\epsilon = 1$, $p = 0.5$, $k = 50$.
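
For reference, a cosine-based user similarity can be sketched as follows. The text does not say which variant is used, so this example assumes the common form in which each user is a sparse rating vector and unrated items count as zero; the function name `cosine_similarity` and the sample ratings are hypothetical.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two users' sparse rating vectors.

    Each argument maps item id -> rating; unrated items are treated as 0
    (an assumption, since the paper does not specify the variant).
    """
    common = ratings_a.keys() & ratings_b.keys()
    dot = sum(ratings_a[t] * ratings_b[t] for t in common)
    norm_a = math.sqrt(sum(r * r for r in ratings_a.values()))
    norm_b = math.sqrt(sum(r * r for r in ratings_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two illustrative users who share one rated item.
u_a = {1: 5, 2: 3}
u_b = {1: 4, 3: 2}
print(cosine_similarity(u_a, u_b))
```

Because ratings are non-negative, this similarity lies in $[0, 1]$, matching the range assumed for $sim(a, i)$ in the proofs.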



Figure 1: Impacts of $p$ on accuracy ($\epsilon = 1$, $k = 50$, $\rho = 0.5$); axes: $p$ (horizontal), MAE (vertical).


Table 1: Impacts of $p$ on accuracy, MAE ($\epsilon = 1$, $k = 50$, $\rho = 0.5$)

p      0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0
kNN    0.6956  0.7027  0.6908  0.6835  0.7074  0.6849  0.6899  0.6847  0.6746  0.6897
PPNS   0.8333  0.7813  0.7289  0.7134  0.7085  0.6872  0.6914  0.6854  0.6792  0.6897
nPNS   0.8762  0.8918  0.8884  0.8797  0.8878  0.8863  0.8889  0.8873  0.8781  0.8893
PNCF   0.8798  0.8928  0.8885  0.8738  0.8753  0.8790  0.8845  0.8869  0.8783  0.8940


Figure 2: Impacts of $p$ on security (total partition number = 19); axes: $p$ (horizontal), $\beta$ (vertical).


Table 2: Impacts of $p$ on security, value of $\beta$ (total partition number = 19)

p      0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0
kNN    1    1    1    1    1    1    1    1    1    1
PPNS   15   13   10   8    7    6    5    3    2    1
nPNS   17   17   17   17   17   17   17   17   17   17
PNCF   17   17   17   17   17   17   17   17   17   17



Figure 3: Impacts of $k$ on accuracy ($\epsilon = 1$, $p = 0.5$, $\rho = 0.5$); horizontal axis: $k$.



Table 3: Impacts of $k$ on accuracy, MAE ($\epsilon = 1$, $p = 0.5$, $\rho = 0.5$)

k      10      20      30      40      50      60      70      80      90      100
kNN    0.8065  0.7149  0.7288  0.6942  0.6957  0.6644  0.6679  0.6574  0.6699  0.6746
PPNS   0.8962  0.8017  0.7716  0.7395  0.7430  0.7140  0.7225  0.7140  0.7258  0.7362
nPNS   0.9687  0.9131  0.9034  0.8867  0.8904  0.8698  0.8695  0.8592  0.8599  0.8604
PNCF   0.9856  0.9245  0.9094  0.8862  0.8941  0.8669  0.8687  0.8528  0.8624  0.8650



Figure 4: Impacts of $\rho$ on accuracy ($\epsilon = 1$, $p = 0.5$, $k = 50$).


Table 4: Impacts of $\rho$ on accuracy, MAE ($\epsilon = 1$, $p = 0.5$, $k = 50$)

ρ      0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
kNN    0.6815  0.6962  0.6914  0.6740  0.6821  0.6725  0.6887  0.6808  0.6910
PPNS   0.7310  0.7326  0.7175  0.7160  0.7374  0.7107  0.7289  0.7185  0.7196
nPNS   0.8915  0.8817  0.8873  0.8770  0.8801  0.8657  0.8812  0.8701  0.8813
PNCF   0.8911  0.8841  0.8932  0.8778  0.8862  0.8667  0.8837  0.8738  0.8733


According to the experimental results, we observe the following:

(1) From Fig. 1, when setting
$p > 1 - \left(\frac{n-k}{n}\right)^{\exp\left(\frac{\epsilon}{4k \cdot RS}\right)}$,
the accuracy performance of Partitioned Probabilistic Neighbour Selection is
always better than that of the PNCF method [29] and Probabilistic Neighbour
Selection. When the value of $p$ is close to 1, the performance of
Partitioned Probabilistic Neighbour Selection is close to that of the kNN
method. In particular, when $p = 1$, the Partitioned Probabilistic
Neighbour Selection method is the same as the kNN method.

(2) From Fig. 1 and Fig. 2, the accuracy performance of the
neighbourhood-based CF methods largely depends on the quality of the
neighbours. Our Partitioned Probabilistic Neighbour Selection method yields
a better trade-off between recommendation accuracy and security: when we
offer better accuracy, we do not lose much security.

(3) From Fig. 3, the size of the neighbour set impacts the accuracy
performance of all of the neighbourhood-based CF recommendation approaches.
A larger neighbour set size $k$ yields better accuracy performance.

(4) From Fig. 4, the value of A in [29] is not achievable, because the value
of $\rho$ does not impact the accuracy performance of PNCF [29].


8 Conclusion 


Recommender systems play an important role in e-commerce. The existing
privacy-preserving neighbourhood-based CF methods, which aim to protect
users' private information during the process of filtering, fail to protect
users' privacy in rating prediction. The global probabilistic neighbour
selection methods, such as the PNCF method [29] and Probabilistic Neighbour
Selection, can protect users' privacy against the kNN attack successfully,
but provide no data utility guarantee. To overcome the weaknesses of the
current methods, we propose a novel privacy-preserving neighbourhood-based
CF method, Partitioned Probabilistic Neighbour Selection, to ensure a
required recommendation accuracy while maintaining high system security
against the $\beta$-kNN attack (a generalisation of the kNN attack).
Theoretical and experimental analyses show that, to provide an
accuracy-assured recommendation against the most popular attack, the kNN
attack, our Partitioned Probabilistic Neighbour Selection method yields a
better trade-off between recommendation accuracy and system security than
the PNCF method [29] and Probabilistic Neighbour Selection.